Overview

Dataset Statistics

Number of Variables 6
Number of Rows 1,000,163
Missing Cells 0
Missing Cells (%) 0.0%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 152.8 MiB
Average Row Size in Memory 160.2 B
Variable Types
  • Numerical: 4
  • Categorical: 2
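The headline figures above are simple counts over the table. The following is a minimal sketch in plain Python. It assumes rows are tuples of equal length and that a missing cell is represented as `None`, since the report does not say how missing values are encoded. `dataset_stats` and the toy rows are hypothetical and are not taken from the dataset:

```python
from typing import Any, Sequence

# Hypothetical helper, not part of the profiling tool.
def dataset_stats(rows: Sequence[tuple[Any, ...]]) -> dict[str, float]:
    """Count variables, rows, missing cells and duplicate rows.

    Assumes every row has the same length and that None marks a missing cell.
    """
    n_rows = len(rows)
    n_vars = len(rows[0]) if rows else 0
    n_cells = n_rows * n_vars
    missing = sum(cell is None for row in rows for cell in row)
    # A duplicate is any row identical to a row already seen.
    duplicates = n_rows - len(set(rows))
    return {
        "variables": n_vars,
        "rows": n_rows,
        "missing_cells": missing,
        "missing_cells_pct": 100 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100 * duplicates / n_rows if n_rows else 0.0,
    }

# Toy rows shaped like this table (index, zip prefix, lat, lng, city, state).
rows = [
    (0, 1001, -23.5, -46.6, "sao paulo", "SP"),
    (1, 1002, -23.6, -46.7, "sao paulo", "SP"),
    (1, 1002, -23.6, -46.7, "sao paulo", "SP"),
    (2, 1003, None, -46.6, "sao paulo", "SP"),
]
print(dataset_stats(rows))
```

Because the `index` column is unique per row, a table like this one cannot contain exact duplicate rows, which fits the reported 0 duplicates.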

Dataset Insights

  • index is uniformly distributed
  • geolocation_lat is skewed right
  • geolocation_lng is slightly skewed left
  • geolocation_city has a high cardinality: 8011 distinct values
  • geolocation_state has a constant length of 2 characters
  • geolocation_lat has 998827 (99.87%) negative values
  • geolocation_lng has 1000160 (99.9997%) negative values
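The negative-value percentages are ratios against the total row count. That count is 1,000,163, which is half of the 2,000,326 characters in the two-letter state codes. Here is a quick check of the arithmetic. `pct` is a hypothetical helper:

```python
# Each state code has exactly 2 characters, so the letter count gives the row count.
N_ROWS = 2_000_326 // 2

# Hypothetical helper: share of rows, in percent.
def pct(count: int, total: int = N_ROWS) -> float:
    return 100 * count / total

lat_neg = pct(998_827)
lng_neg = pct(1_000_160)
print(f"geolocation_lat negatives: {lat_neg:.2f}%")  # 99.87%
print(f"geolocation_lng negatives: {lng_neg:.4f}%")  # 99.9997%
```

Only 3 longitudes are non-negative, so a rounded "100.0%" would hide the positive maximum of 121.1054.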

Variables


index

numerical

Approximate Distinct Count 1000163
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 16002608 B
Mean 500081
Minimum 0
Maximum 1000162
Zeros 1
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • index is uniformly distributed

Quantile Statistics

Minimum 0
5th Percentile 45007.29
Q1 245039.69
Median 495080.19
Q3 745121.2
95th Percentile 945153.2
Maximum 1000162
Range 1000162
IQR 500081.51
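Below is a sketch of how a quantile block like this can be computed with the standard library. It assumes linear interpolation between order statistics (`method="inclusive"`, which matches NumPy's default percentile method). The report does not state its interpolation rule. For the `index` column, which runs from 0 to 1,000,162, exact linear interpolation would give a median of 500081, so the figures above appear to be approximations. `quantile_summary` is a hypothetical helper:

```python
import statistics

# Hypothetical helper, not part of the profiling tool.
def quantile_summary(values: list[float]) -> dict[str, float]:
    """Min, 5th/95th percentiles, quartiles, max, range and IQR."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    ventiles = statistics.quantiles(values, n=20, method="inclusive")
    lo, hi = min(values), max(values)
    return {
        "min": lo,
        "p5": ventiles[0],    # first of 19 cut points = 5th percentile
        "q1": q1,
        "median": median,
        "q3": q3,
        "p95": ventiles[-1],  # last cut point = 95th percentile
        "max": hi,
        "range": hi - lo,
        "iqr": q3 - q1,
    }

print(quantile_summary([1, 2, 3, 4, 5]))
```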

Descriptive Statistics

Mean 500081
Standard Deviation 288722.333
Variance 8.3361e+10
Sum 5.0016e+11
Skewness -2.0019e-16
Kurtosis -1.2
Coefficient of Variation 0.5774
  • index is not normally distributed (p-value 0.003171129119923)
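The `index` column holds the consecutive integers 0 to 1,000,162 (N = 1,000,163). Its descriptive statistics therefore follow from closed forms for a discrete uniform sequence. The reported standard deviation matches the sample (n - 1) estimator, whose variance for 0..N-1 is N(N + 1)/12. The variable names below are illustrative:

```python
import math

N = 1_000_163  # index runs from 0 to N - 1

mean = (N - 1) / 2
variance = N * (N + 1) / 12   # closed form of the sample (n - 1) variance of 0..N-1
std = math.sqrt(variance)
total = N * (N - 1) // 2      # sum of 0 .. N-1
cv = std / mean
# Excess kurtosis of a discrete uniform distribution on N points (tends to -1.2).
excess_kurtosis = -6 * (N**2 + 1) / (5 * (N**2 - 1))

print(f"mean={mean}, std={std:.3f}, variance={variance:.4e}")
print(f"sum={total:.4e}, cv={cv:.4f}, kurtosis={excess_kurtosis:.1f}")
```

These values agree with the Mean, Standard Deviation, Variance, Sum, Coefficient of Variation and Kurtosis rows above.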

geolocation_zip_code_prefix

numerical

Approximate Distinct Count 19015
Approximate Unique (%) 1.9%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 16002608 B
Mean 36574.1665
Minimum 1001
Maximum 99990
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • geolocation_zip_code_prefix is skewed right (γ1 = 0.6945)

Quantile Statistics

Minimum 1001
5th Percentile 3110
Q1 9961
Median 26510
Q3 61700
95th Percentile 90820
Maximum 99990
Range 98989
IQR 51739

Descriptive Statistics

Mean 36574.1665
Standard Deviation 30549.3357
Variance 9.3326e+08
Sum 3.658e+10
Skewness 0.6945
Kurtosis -0.9412
Coefficient of Variation 0.8353
  • geolocation_zip_code_prefix is not normally distributed (p-value 4.2825013604239084e-05)
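The γ1 values are sample skewness coefficients. Below is a sketch of the moment-based estimators. It assumes the profiler uses the biased Fisher-Pearson forms g1 = m3 / m2^1.5 and excess kurtosis g2 = m4 / m2^2 - 3, which are the defaults of `scipy.stats.skew` and `scipy.stats.kurtosis`. The report does not say which estimator it applies, and the helper names are hypothetical:

```python
# Hypothetical helpers; the biased (divide-by-n) moment estimators are assumed.
def central_moment(values: list[float], k: int) -> float:
    """k-th central moment, dividing by n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n

def skewness(values: list[float]) -> float:
    """Fisher-Pearson coefficient g1 = m3 / m2**1.5 (positive = long right tail)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values: list[float]) -> float:
    """g2 = m4 / m2**2 - 3, which is 0 for a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

data = [0, 0, 0, 1]  # one large value on the right gives positive skew
print(round(skewness(data), 4), round(excess_kurtosis(data), 4))
```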

geolocation_lat

numerical

Approximate Distinct Count 717360
Approximate Unique (%) 71.7%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 16002608 B
Mean -21.1762
Minimum -36.6054
Maximum 45.0659
Zeros 0
Zeros (%) 0.0%
Negatives 998827
Negatives (%) 99.9%
  • geolocation_lat is skewed right (γ1 = 1.5651)

Quantile Statistics

Minimum -36.6054
5th Percentile -28.9757
Q1 -23.5987
Median -22.9215
Q3 -19.9803
95th Percentile -7.6771
Maximum 45.0659
Range 81.6713
IQR 3.6183

Descriptive Statistics

Mean -21.1762
Standard Deviation 5.7159
Variance 32.6711
Sum -2.118e+07
Skewness 1.5651
Kurtosis 2.8501
Coefficient of Variation -0.2699
  • geolocation_lat is not normally distributed (p-value 2.836140269515105e-17)
  • geolocation_lat has 168412 outliers
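The report does not define what counts as an outlier. One common rule is Tukey's fences, assumed here only for illustration: a value is flagged if it is below Q1 - 1.5·IQR or above Q3 + 1.5·IQR. With the geolocation_lat quartiles above, the fences are about -29.03 and -14.55. The 95th percentile (-7.6771) lies above the upper fence, so under this rule more than 5% of rows are outliers. That is consistent with the reported 168412 but does not confirm it. The helper names are hypothetical:

```python
# Hypothetical helpers; Tukey's 1.5*IQR rule is an assumption, not the tool's stated rule.
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Lower and upper fences Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def count_outliers(values: list[float], q1: float, q3: float, k: float = 1.5) -> int:
    """Number of values strictly outside the Tukey fences."""
    lower, upper = tukey_fences(q1, q3, k)
    return sum(1 for x in values if x < lower or x > upper)

# Quartiles reported for geolocation_lat.
lower, upper = tukey_fences(-23.5987, -19.9803)
print(f"fences: [{lower:.4f}, {upper:.4f}]")
print(count_outliers([-40.0, -25.0, -22.0, -10.0], -23.5987, -19.9803))
```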

geolocation_lng

numerical

Approximate Distinct Count 717613
Approximate Unique (%) 71.8%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 16002608 B
Mean -46.3905
Minimum -101.4668
Maximum 121.1054
Zeros 0
Zeros (%) 0.0%
Negatives 1000160
Negatives (%) 100.0%
  • geolocation_lng is slightly skewed left (γ1 = -0.1024)

Quantile Statistics

Minimum -101.4668
5th Percentile -53.2203
Q1 -48.527
Median -46.6362
Q3 -43.8429
95th Percentile -38.5044
Maximum 121.1054
Range 222.5722
IQR 4.6841

Descriptive Statistics

Mean -46.3905
Standard Deviation 4.2697
Variance 18.2308
Sum -4.6398e+07
Skewness -0.1024
Kurtosis 4.727
Coefficient of Variation -0.09204
  • geolocation_lng is not normally distributed (p-value 3.6418982866942177e-19)
  • geolocation_lng has 44383 outliers
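The coefficients of variation for the two coordinate columns are negative because CV = standard deviation / mean, and both means are negative. Below is a sketch using the reported figures. It also shows the variant that divides by the absolute mean, for comparisons where only the magnitude of relative dispersion matters. `coefficient_of_variation` is a hypothetical helper:

```python
# Hypothetical helper, not part of the profiling tool.
def coefficient_of_variation(std: float, mean: float, absolute: bool = False) -> float:
    """CV = std / mean; with absolute=True, divide by |mean| instead."""
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return std / (abs(mean) if absolute else mean)

# Reported standard deviations and means.
cv_lat = coefficient_of_variation(5.7159, -21.1762)
cv_lng = coefficient_of_variation(4.2697, -46.3905)
print(round(cv_lat, 4), round(cv_lng, 5))
print(round(coefficient_of_variation(4.2697, -46.3905, absolute=True), 5))
```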

geolocation_city

categorical

Approximate Distinct Count 8011
Approximate Unique (%) 0.8%
Missing 0
Missing (%) 0.0%
Memory Size 77804186 B
  • The most frequent value (sao paulo) occurs over 2.19 times as often as the second most frequent value (rio de janeiro)
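This note compares the counts of the two most common values. Here is a sketch with `collections.Counter`. The toy list is illustrative and not drawn from the data, and `top_two_ratio` is a hypothetical helper:

```python
from collections import Counter

# Hypothetical helper; needs at least two distinct values.
def top_two_ratio(values: list[str]) -> tuple[str, str, float]:
    """Most and second-most frequent values and the ratio of their counts."""
    (first, n1), (second, n2) = Counter(values).most_common(2)
    return first, second, n1 / n2

cities = ["sao paulo"] * 9 + ["rio de janeiro"] * 4 + ["curitiba"] * 2
print(top_two_ratio(cities))  # ('sao paulo', 'rio de janeiro', 2.25)
```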

Length

Mean 10.4683
Standard Deviation 4.0987
Median 9
Minimum 2
Maximum 38
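The Length block summarizes the number of characters in each value. The following is a sketch with the `statistics` module. It assumes the standard deviation is the sample (n - 1) estimator, as it is for the numerical columns. The toy values and `length_summary` are hypothetical:

```python
import statistics

# Hypothetical helper, not part of the profiling tool.
def length_summary(values: list[str]) -> dict[str, float]:
    """Mean, sample standard deviation, median, min and max of string lengths."""
    lengths = [len(v) for v in values]
    return {
        "mean": statistics.mean(lengths),
        "std": statistics.stdev(lengths),  # divides by n - 1
        "median": statistics.median(lengths),
        "min": min(lengths),
        "max": max(lengths),
    }

print(length_summary(["sao paulo", "rio de janeiro", "ita"]))
```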

Sample

1st row sao paulo
2nd row sao paulo
3rd row sao paulo
4th row sao paulo
5th row sao paulo

Letter

Count 9607367
Lowercase Letter 9607367
Space Separator 778303
Uppercase Letter 0
Dash Punctuation 2433
Decimal Number 17
  • geolocation_city contains many distinct words: 5566
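The category names in the Letter block are Unicode general categories: Lowercase Letter (Ll), Uppercase Letter (Lu), Space Separator (Zs), Dash Punctuation (Pd) and Decimal Number (Nd). The tally can be sketched with `unicodedata`. The sketch assumes Count is the total of all letter categories, which fits both columns here. The helper and toy values are hypothetical:

```python
import unicodedata
from collections import Counter

NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Nd": "Decimal Number",
}

# Hypothetical helper; "Count" is assumed to be all letters (categories L*).
def char_categories(values: list[str]) -> dict[str, int]:
    """Count characters per Unicode general category across all values."""
    cats = Counter(unicodedata.category(ch) for v in values for ch in v)
    summary = {"Count": sum(n for c, n in cats.items() if c.startswith("L"))}
    summary.update({name: cats.get(code, 0) for code, name in NAMES.items()})
    return summary

print(char_categories(["sao paulo", "embu-guacu", "rio de janeiro"]))
```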

geolocation_state

categorical

Approximate Distinct Count 27
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 67010921 B
  • The most frequent value (SP) occurs over 3.2 times as often as the second most frequent value (MG)

Length

Mean 2
Standard Deviation 0
Median 2
Minimum 2
Maximum 2

Sample

1st row SP
2nd row SP
3rd row SP
4th row SP
5th row SP

Letter

Count 2000326
Lowercase Letter 0
Space Separator 0
Uppercase Letter 2000326
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (SP, MG) account for over 50.0% of rows
  • geolocation_state has words of constant length
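The "top 2 categories" note reports the combined frequency share of the most common values. Here is a sketch with toy data. `top_k_share` is a hypothetical helper:

```python
from collections import Counter

# Hypothetical helper, not part of the profiling tool.
def top_k_share(values: list[str], k: int = 2) -> tuple[list[str], float]:
    """The k most frequent values and the percentage of rows they cover."""
    top = Counter(values).most_common(k)
    share = 100 * sum(n for _, n in top) / len(values)
    return [v for v, _ in top], share

states = ["SP"] * 6 + ["MG"] * 2 + ["RJ", "PR"]
print(top_k_share(states))  # (['SP', 'MG'], 80.0)
```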
